Missing data estimation in morphometrics: how much is too much?

Authors

  • Julien Clavel
  • Gildas Merceron
  • Gilles Escarguel
Abstract

Fossil-based estimates of diversity and evolutionary dynamics mainly rely on the study of morphological variation. Unfortunately, organism remains are often altered by post-mortem taphonomic processes such as weathering or distortion. Such a loss of information often prevents quantitative multivariate description and statistically controlled comparisons of extinct species based on morphometric data. A common way to deal with missing data involves imputation methods that directly fill the missing cases with model estimates. In recent years, several empirically determined thresholds for the maximum acceptable proportion of missing values have been proposed in the literature, whereas other studies showed that this limit actually depends on various properties of the study data set and of the selected imputation method, and is in no way generalizable. We evaluate the relative performance of seven multiple imputation (MI) techniques through a simulation-based analysis under three distinct patterns of missing data distribution. Overall, Fully Conditional Specification and Expectation-Maximization algorithms provide the best compromises between imputation accuracy and coverage probability. MI techniques appear remarkably robust to the violation of basic assumptions such as the occurrence of taxonomically or anatomically biased patterns of missing data distribution, making differences in simulation results between the three patterns of missing data distribution much smaller than differences between the individual MI techniques. Based on these results, rather than proposing a new (set of) threshold value(s), we develop an approach combining the use of MIs with procrustean superimposition of principal component analysis results, in order to directly visualize the effect of individual missing data imputation on an ordinated space. We provide an R function for users to implement the proposed procedure.
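The general idea of combining multiple imputation with Procrustes superimposition of PCA results can be illustrated with a minimal Python sketch. This is not the authors' R function. Several parts are assumptions made for illustration only: the imputation step is a crude single-pass stochastic regression (a stand-in, not one of the seven MI techniques evaluated), the reference configuration is taken to be the PCA of the averaged imputed data set, the superimposition uses rotation and reflection only, and all function names and the simulated data are hypothetical.

```python
import numpy as np

def impute_once(X, rng):
    """One crude stochastic regression imputation (stand-in for an MI method)."""
    miss = np.isnan(X)
    filled = np.where(miss, np.nanmean(X, axis=0), X)  # start from column means
    for j in np.where(miss.any(axis=0))[0]:
        obs = ~miss[:, j]
        others = np.delete(filled, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        # Regress variable j on the other variables, using rows where j is observed
        beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
        resid_sd = np.std(X[obs, j] - A[obs] @ beta)
        # Predicted value plus random noise, so repeated imputations differ
        noise = rng.normal(0.0, resid_sd, (~obs).sum())
        filled[~obs, j] = A[~obs] @ beta + noise
    return filled

def pca_scores(X, k=2):
    """Scores on the first k principal components (covariance PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def procrustes_rotate(target, source):
    """Rotate/reflect `source` to best match `target` in the least-squares sense."""
    U, _, Vt = np.linalg.svd(source.T @ target)
    return source @ (U @ Vt)

def imputation_cloud(X, m=20, k=2, seed=0):
    """PCA scores of m imputed data sets, superimposed on a common reference."""
    rng = np.random.default_rng(seed)
    imputations = [impute_once(X, rng) for _ in range(m)]
    # Assumption: the reference is the PCA of the averaged imputed data set
    reference = pca_scores(np.mean(imputations, axis=0), k)
    aligned = np.stack([procrustes_rotate(reference, pca_scores(Xi, k))
                        for Xi in imputations])
    return reference, aligned  # aligned has shape (m, n_specimens, k)

# Simulated measurements: 30 specimens, 5 correlated variables, 6 missing cells
rng = np.random.default_rng(1)
size = rng.normal(size=(30, 1))
X = size @ np.ones((1, 5)) + 0.3 * rng.normal(size=(30, 5))
for row, col in [(0, 1), (3, 2), (7, 4), (12, 0), (12, 3), (21, 2)]:
    X[row, col] = np.nan

reference, aligned = imputation_cloud(X)
# Per-specimen scatter across imputations in the ordinated space
spread = aligned.std(axis=0).sum(axis=1)
print(spread.round(3))
```

Plotting the rows of `aligned` around `reference` gives one cloud of points per specimen. Under the assumptions above, a wide cloud suggests that a specimen's position in the ordination is sensitive to how its missing values were filled in.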

Related articles

The Development of Maximum Likelihood Estimation Approaches for Adaptive Estimation of Free Speed and Critical Density in Vehicle Freeways

The performance of many traffic control strategies depends on how accurately the traffic flow models have been calibrated. One of the most applicable traffic flow models in traffic control and management is the LWR or METANET model. In practice, key parameters of the LWR model, including free flow speed and critical density, are parameterized using flow and speed measurements gathered by inductive ...

Optimal Aminoglycoside Therapy Following the Sepsis: How Much Is Too Much?

Severe sepsis and septic shock are major problems because of their high rates of morbidity and mortality in intensive care units (ICUs). In the presence of septic shock, each hour of delay in the administration of effective antibiotics is associated with a measurable increase in mortality. Aminoglycosides are effective broad-spectrum antibiotics that are commonly used in ICUs for the treatment of life...

The Development of Maximum Likelihood Estimation Approaches for Adaptive Estimation of Free Speed and Critical Density in Vehicle Freeways

The performance of many traffic control strategies depends on how accurately the traffic flow models are calibrated. One of the most applicable traffic flow models in traffic control and management is the LWR or METANET model. In practice, key parameters of the LWR model, including free flow speed and critical density, are parameterized using flow and speed measurements gathered by inductive loop d...

Estimation of Climate Zone Effects on Iranian Temperature, Humidity, and Precipitation using Functional Analysis of Covariance

Functional Data Analysis (FDA) has recently made considerable progress because of easier access to the data that are essentially in the form of curves. Although functional modeling of Iranian precipitation based on temperature or humidity was done before, here we use functional analysis of variance and covariance to analyze the weather data collected randomly from Iranian weather stations in 20...

Journal:
  • Systematic Biology

Volume 63, Issue 2

Pages: -

Publication date: 2014